Conversation
…18767)

- Add support for quantized clamp-type activations in the Cortex-M pipeline by canonicalizing relu/hardtanh/clamp to quantized aten.clamp.default for standalone int8 paths.
- Extend activation fusion to cover max_pool2d.

@freddan80 @per @zingo @oscarandersson8218 @digantdesai @Sebastian-Larsson @AdrianLundell @psiddh

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Signed-off-by: Xingguo Li <xingguo.li@arm.com>
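As a rough illustration of the canonicalization idea above (not the actual Cortex-M pass; the helper name, table, and pattern coverage are assumptions), an FX rewrite of clamp-like activations might look like this. relu(x) is clamp(x, 0, None) and hardtanh(x, lo, hi) is clamp(x, lo, hi), so later int8 handling only needs one pattern:

```python
import torch
from torch import fx

# Map each clamp-like op to a function producing (input, min, max) clamp args.
_CLAMP_ARGS = {
    torch.ops.aten.relu.default: lambda args: (args[0], 0.0, None),
    torch.ops.aten.hardtanh.default: lambda args: (
        args[0],
        args[1] if len(args) > 1 else -1.0,  # hardtanh default min_val
        args[2] if len(args) > 2 else 1.0,   # hardtanh default max_val
    ),
}


def canonicalize_to_clamp(gm: fx.GraphModule) -> fx.GraphModule:
    for node in list(gm.graph.nodes):
        rewrite = _CLAMP_ARGS.get(node.target)
        if node.op == "call_function" and rewrite is not None:
            x, lo, hi = rewrite(node.args)
            with gm.graph.inserting_after(node):
                clamp = gm.graph.call_function(
                    torch.ops.aten.clamp.default, (x, lo, hi)
                )
            node.replace_all_uses_with(clamp)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm
```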
…ch#18971)

FuseConstantArgsPass resolved input_qparams by flattened input-node index, while FoldAndAnnotateQParamsPass stores them by top-level argument index. For aten.cat with a list-valued tensor argument, this caused only the first tensor to be dequantized before folding, which corrupted the fused constant.

Resolve qparams by top-level argument index and propagate that qparam through nested list and tuple arguments. Add a regression test for quantized aten.cat constant folding with list-valued tensor inputs.

Signed-off-by: Per Held <per.held@arm.com>
Change-Id: I6e1a012d82a5dbeecb403c440a2944953dd5cba7
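A schematic of the fixed resolution logic (hypothetical helper names and a value-level sketch; the real pass operates on FX nodes):

```python
def dequantize_arg(arg, qp):
    """Apply one top-level slot's qparams to every tensor nested in arg."""
    if isinstance(arg, (list, tuple)):
        # e.g. aten.cat's list argument: every element shares the slot's qparams.
        return type(arg)(dequantize_arg(a, qp) for a in arg)
    return (arg.float() - qp.zero_point) * qp.scale  # dequantize one tensor


def dequantize_inputs(args, input_qparams):
    # input_qparams maps top-level argument index -> qparams, as stored by
    # FoldAndAnnotateQParamsPass; resolve by that index, not by flattened
    # tensor position.
    return [
        dequantize_arg(arg, input_qparams[i]) if i in input_qparams else arg
        for i, arg in enumerate(args)
    ]
```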
Fixes pytorch#10736

Formats `third-party/CMakeLists.txt` using `cmake-format` to improve readability and consistency.

**Changes:**
- Reformatted `ExternalProject_Add(...)` blocks for `flatbuffers` and `flatcc`
- Reflowed `set_target_properties(...)`, `set(...)` cache variables, and `install(...)` calls
- No functional changes; formatting only
All 4 tests failed because they called forward() with zero arguments on mobilenet_v2, which expects a [1,3,224,224] float input. This was a test bug, not a runtime bug.

Add a dummyInput() helper that creates a Tensor.ones with the correct shape, and remove all @ignore annotations.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Differential Revision: D101887672 Pull Request resolved: pytorch#19035
Differential Revision: D102189156 Pull Request resolved: pytorch#19077
…rch#19092)

Add a cause-chaining constructor to ExecutorchRuntimeException so wrapped exceptions preserve the original cause in the stack trace.

Restore detailed native error messages in LlmModule.load(): the null-runner case now reports the model_type_category and valid values instead of a generic message, and load failures now throw from JNI with the specific error code and description.

This commit was authored with the help of Claude.
…ytorch#18959)

Summary: The CUDA runtime shims for sort operations use the Half (float16) dtype, but it was not defined in the slim ScalarType enum, causing compiler warnings that are treated as errors (-Werror=switch). This adds proper Half support to the slim ScalarType enum so switch statements can use the enum value directly instead of casting to the underlying type.

Differential Revision: D101218928
1. Attacker sets that flag on an external tensor.
2. XNNPACK thinks it owns the tensor and frees it inside the backend.
3. The ExecuTorch runtime frees it again at method destruction, resulting in a double free.
Test Plan:
Build and run executor_runner against the problematic PTE file:
```
# Build executor_runner:
cmake -B cmake-out \
  -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON \
  -DEXECUTORCH_BUILD_XNNPACK=ON
cmake --build cmake-out -j16 --target executor_runner

# Run against the repro PTE file:
(executorch) [lfq@devvm11764.nha0 /data/users/lfq/security/executorch (f9f29e7)]$ ./cmake-out/executor_runner --model_path=/data/users/lfq/security/executorch_repros/TOB-EXECUTORCH-44.pte
```
Before:
```
(executorch) [lfq@devvm11764.nha0 /data/users/lfq/security/executorch (security44)]$ ./cmake-out/executor_runner --model_path=/data/users/lfq/security/executorch_repros/TOB-EXECUTORCH-44.pte
Note (XNNPACK): l1_data_cache_bytes=32768, l1_data_cache_line_size=64, l1_data_cache_associativity=8, l1_data_cache_num_sets=64. (init_hardware_config, /data/users/lfq/security/executorch/backends/xnnpack/third-party/XNNPACK/src/configs/hardware-config.c:417)
Note (XNNPACK): l2_data_cache_bytes=1048576, l2_data_cache_line_size=64, l2_data_cache_associativity=8, l2_data_cache_num_sets=2048. (init_hardware_config, /data/users/lfq/security/executorch/backends/xnnpack/third-party/XNNPACK/src/configs/hardware-config.c:436)
I 00:00:00.002612 executorch:cpuinfo_utils.cpp:71] Reading file /sys/devices/soc0/image_version
I 00:00:00.002640 executorch:cpuinfo_utils.cpp:87] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.002657 executorch:cpuinfo_utils.cpp:100] Reading file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.002664 executorch:cpuinfo_utils.cpp:109] Failed to open midr file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.002671 executorch:cpuinfo_utils.cpp:125] CPU info and manual query on # of cpus dont match.
I 00:00:00.002672 executorch:executor_runner.cpp:223] Resetting threadpool with num threads = 0
I 00:00:00.002722 executorch:executor_runner.cpp:374] Model file /data/users/lfq/security/executorch_repros/TOB-EXECUTORCH-44.pte is loaded.
I 00:00:00.002729 executorch:executor_runner.cpp:384] Using method forward
I 00:00:00.002739 executorch:executor_runner.cpp:435] Setting up planned buffer 0, size 112.
E 00:00:00.002806 executorch:XNNCompiler.cpp:331] Tensor value has unsupported flag bits 0xffffff00
E 00:00:00.002824 executorch:XNNPACKBackend.cpp:122] XNNCompiler::compileModel failed: 0x23
E 00:00:00.002827 executorch:method.cpp:127] Init failed for backend XnnpackBackend: 0x23
F 00:00:00.002830 executorch:executor_runner.cpp:459] In function main(), assert failed (method.ok()): Loading of method forward failed with status 0x23
Aborted (core dumped)
```
After (graceful error):
```
(executorch) [lfq@devvm11764.nha0 /data/users/lfq/security/executorch (security44)]$ ./cmake-out/executor_runner --model_path=/data/users/lfq/security/executorch_repros/TOB-EXECUTORCH-44.pte
Note (XNNPACK): l1_data_cache_bytes=32768, l1_data_cache_line_size=64, l1_data_cache_associativity=8, l1_data_cache_num_sets=64. (init_hardware_config, /data/users/lfq/security/executorch/backends/xnnpack/third-party/XNNPACK/src/configs/hardware-config.c:417)
Note (XNNPACK): l2_data_cache_bytes=1048576, l2_data_cache_line_size=64, l2_data_cache_associativity=8, l2_data_cache_num_sets=2048. (init_hardware_config, /data/users/lfq/security/executorch/backends/xnnpack/third-party/XNNPACK/src/configs/hardware-config.c:436)
I 00:00:00.002562 executorch:cpuinfo_utils.cpp:71] Reading file /sys/devices/soc0/image_version
I 00:00:00.002595 executorch:cpuinfo_utils.cpp:87] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.002607 executorch:cpuinfo_utils.cpp:100] Reading file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.002618 executorch:cpuinfo_utils.cpp:109] Failed to open midr file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.002623 executorch:cpuinfo_utils.cpp:125] CPU info and manual query on # of cpus dont match.
I 00:00:00.002628 executorch:executor_runner.cpp:223] Resetting threadpool with num threads = 0
I 00:00:00.002672 executorch:executor_runner.cpp:374] Model file /data/users/lfq/security/executorch_repros/TOB-EXECUTORCH-44.pte is loaded.
I 00:00:00.002678 executorch:executor_runner.cpp:384] Using method forward
I 00:00:00.002688 executorch:executor_runner.cpp:435] Setting up planned buffer 0, size 112.
E 00:00:00.002750 executorch:XNNCompiler.cpp:331] Tensor value has unsupported flag bits 0xffffff00
E 00:00:00.002761 executorch:XNNPACKBackend.cpp:122] XNNCompiler::compileModel failed: 0x23
E 00:00:00.002769 executorch:method.cpp:127] Init failed for backend XnnpackBackend: 0x23
F 00:00:00.002772 executorch:executor_runner.cpp:459] In function main(), assert failed (method.ok()): Loading of method forward failed with status 0x23
```
Co-authored-by: Github Executorch <github_executorch@arm.com>
Co-authored-by: Claude <noreply@anthropic.com>
…M (v1) (pytorch#18859)

The original SmolLM2 PR (pytorch#9354) started as v1 support, was renamed to `smollm2` during review, but the repo ID and `rope_theta` were never updated to v2 values. The two checkpoints are genuinely different models (0/272 tensors match).

- `HUGGING_FACE_REPO_IDS["smollm2"]`: `HuggingFaceTB/SmolLM-135M` → `HuggingFaceTB/SmolLM2-135M`
- `examples/models/smollm2/135M_config.json`: `rope_theta` `10000.0` → `100000.0` (matches [SmolLM2-135M HF config](https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config.json))

### Test plan

Data-only change (one string, one number). Verified values match the upstream HuggingFace SmolLM2-135M config.
Add tryTo accessors for each value. Previously, `toTensor` etc. aborted with ET_CHECK_MSG on a type mismatch.

API additions:
- Per-type: tryToInt, tryToDouble, tryToBool, tryToScalar, tryToString, tryToTensor (already present, kept), tryToIntList, tryToBoolList, tryToDoubleList, tryToTensorList, tryToListOptionalTensor, tryToScalarType, tryToMemoryFormat, tryToLayout, tryToDevice. A tag mismatch returns Error::InvalidType; a null list/string payload returns Error::InvalidState.
- Templated tryTo<T>() dispatcher mirroring to<T>(), via a new EVALUE_DEFINE_TRY_TO macro kept adjacent to EVALUE_DEFINE_TO so drift between the two surfaces is visible at review time.
- tryToOptional<T>() widened from Tensor-only to generic, delegating to tryTo<T>() so it works for any supported payload type.

Tests cover success and mismatch paths for each new accessor, plus the widened tryToOptional<T>() path.

Authored-with: Claude

---------

Co-authored-by: Github Executorch <github_executorch@arm.com>
…rity (pytorch#18917) Differential Revision: D99769848 Pull Request resolved: pytorch#18917
…rch#19095)

This PR makes the GPU-related operator CUDA-backend specific, to bring the Metal Qwen 3.5 MoE CI back to green.
Disable fusing of ops that have symbolic shapes as arguments. Also disable fusing of TOSA dialect ops. cc @digantdesai @freddan80 @per @zingo @mansnils @Sebastian-Larsson @robell Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
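As an illustration of the kind of guard the change above implies (hypothetical helper; the actual pass logic may differ), a fusion pass could inspect the FakeTensor metadata on each input and skip nodes whose shapes are only known symbolically:

```python
import torch


def has_symbolic_shape_args(node: torch.fx.Node) -> bool:
    # A node is unsafe to fuse if any input's shape contains a SymInt,
    # i.e. a dimension that is only known symbolically at export time.
    for arg in node.all_input_nodes:
        fake = arg.meta.get("val")
        if fake is not None and any(
            isinstance(dim, torch.SymInt) for dim in fake.shape
        ):
            return True
    return False
```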
Adds a utility for computing a value range from a symbolic expression. cc @digantdesai @freddan80 @per @zingo @mansnils @Sebastian-Larsson @robell Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
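A minimal sketch of such a utility, assuming sympy expressions (as used by torch.fx shape symbols) with per-symbol integer ranges; the names are hypothetical, and corner sampling is only exact for expressions monotonic in each symbol, which covers typical shape arithmetic:

```python
import itertools

import sympy


def value_range(expr, symbol_ranges):
    """Bound `expr` over a box of per-symbol inclusive (lo, hi) ranges by
    evaluating it at every corner of the box."""
    symbols = list(symbol_ranges)
    corners = itertools.product(*(symbol_ranges[s] for s in symbols))
    values = [expr.subs(dict(zip(symbols, corner))) for corner in corners]
    return min(values), max(values)


# Example: 2*s + 1 for s in [1, 16] has range (3, 33).
s = sympy.Symbol("s", positive=True)
assert value_range(2 * s + 1, {s: (1, 16)}) == (3, 33)
```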
The removed copy appears to be stale; it is never used.
…ch#18088)

## Summary

This PR adds a fused `llama::recurrent_gated_delta_rule` custom op and wires Qwen3.5 GatedDeltaNet attention to use it instead of the Python per-token recurrence loop when the op is available. It also tightens local custom-op loading so we no longer implicitly scan repo-local `cmake-out*` directories, and adds coverage for recurrent-state correctness, chunked prefill behavior, and export graph selection.

## What changed

- Added `llama::recurrent_gated_delta_rule` runtime and AOT registrations.
- Updated Qwen3.5 GatedDeltaNet attention to use the fused op, with the Python fallback preserved.
- Tightened `custom_ops_aot_lib` discovery:
  - default to package-local discovery
  - allow explicit override via `EXECUTORCH_CUSTOM_OPS_AOT_LIB`
  - removed implicit repo-local `cmake-out*` scanning
- Added tests for:
  - recurrent op parity vs reference
  - `.out` variant behavior
  - chunked-state parity vs full-sequence execution
  - custom-op vs fallback attention parity
  - tiny Qwen3.5 export selecting `llama.recurrent_gated_delta_rule`

## Validation

### Linux CPU-only (aarch64)

Built `custom_ops_aot_lib` successfully and loaded it via `EXECUTORCH_CUSTOM_OPS_AOT_LIB`. Passed:

- `pytest extension/llm/custom_ops/test_update_cache.py::RecurrentGatedDeltaRuleTest -q`: `3 passed`
- `pytest examples/models/llama/tests/test_qwen3_5_attention.py -q`: `7 passed`
- `pytest examples/models/llama/tests/test_export_llama_lib.py::ExportLlamaLibTest::test_tiny_qwen35_export_uses_recurrent_gated_delta_rule -q`: `1 passed`

### Real-model CPU validation

On a real `Qwen3.5-0.8B` CPU run, the fused recurrence matched the fallback path on next-token selection with very small logit drift, and improved eager prefill latency on the tested prompt. Observed on local CPU validation:

- same next token from the fused path vs the fallback
- max logit diff on the order of `1e-5`
- eager prefill speedup of about `1.6x` on the tested prompt

### Windows note

A local Windows-only FFHT/MSVC workaround was used during development to keep the local build usable, but that workaround is intentionally **not** included in this PR.

## Non-goals / separate issues

I did not treat the local `program.fbs` serialization issue as part of this change. This branch does not modify `exir/_serialize/*` or `schema/program.fbs`, and serialization-focused checks passed on both this branch and clean `main` once the local environment was set up correctly.

A separate end-to-end tiny Qwen3.5 `.pte` export probe hit:

- `RuntimeError: Missing out variants: {'aten::alias'}`

That appears to be a separate pre-existing export issue outside this change set.

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

---------

Co-authored-by: Digant Desai <digantdesai@meta.com>
Co-authored-by: Nikhil Viswanath Sivakumar <68182521+nil-is-all@users.noreply.github.com>
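For orientation, here is a schematic per-token recurrence in the style of the gated delta rule that the fused op replaces. The shapes, names, and exact update are assumptions based on the common Gated DeltaNet formulation, not the op's actual semantics:

```python
import torch


def gated_delta_rule_ref(q, k, v, alpha, beta, state):
    """q, k: [T, Dk]; v: [T, Dv]; alpha, beta: [T]; state: [Dk, Dv]."""
    outs = []
    for t in range(q.shape[0]):
        state = alpha[t] * state                 # gated decay of the memory
        recalled = state.t() @ k[t]              # what the memory returns for k[t]
        # Delta-rule correction: write only the part of v[t] not yet stored.
        state = state + torch.outer(k[t], beta[t] * (v[t] - recalled))
        outs.append(state.t() @ q[t])            # read out for the query
    return torch.stack(outs), state
```

A fused op would run this loop natively over the whole sequence instead of dispatching one Python iteration per token, which is consistent with the reported ~1.6x eager prefill speedup.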